Augmentation of a Term/Document Matrix with Part-of-Speech Tags to Improve Accuracy of Latent Semantic Analysis
Authors
Abstract
We consider the improvement in accuracy of latent semantic analysis (LSA) when a part-of-speech (POS) tagger is used to augment a term/document matrix. We first construct an augmented term/document matrix as input to singular value decomposition (SVD). The singular values then serve as principal components for a cosine projection. The results show that adding POS tags can significantly decrease ambiguity.
Key-Words: Latent Semantic Analysis, Documents, Tags, Singular Value Decomposition
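The pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the toy corpus and its hand-assigned tags are hypothetical (no real POS tagger is run), raw counts are used as weights, and documents are compared by cosine similarity after projection onto the top k singular directions, which is one common LSA formulation. The names `TAGGED_DOCS`, `build_matrix`, `lsa_doc_vectors`, and `cosine` are invented for this example.

```python
import numpy as np

# Hypothetical toy corpus, tagged by hand (assumption: no real tagger is used).
# "book" appears as a noun (NN) in documents 0 and 2, and as a verb (VB) in 1 and 3.
TAGGED_DOCS = [
    [("book", "NN"), ("library", "NN"), ("read", "VB")],
    [("book", "VB"), ("flight", "NN"), ("travel", "VB")],
    [("read", "VB"), ("book", "NN"), ("novel", "NN")],
    [("book", "VB"), ("hotel", "NN"), ("travel", "VB")],
]


def build_matrix(tagged_docs, use_tags):
    """Return (vocab, matrix) with one row per term and one column per document.

    With use_tags=True each term is the word joined to its POS tag
    (e.g. "book_NN"), which yields the augmented term/document matrix.
    """
    vocab = {}
    docs_terms = []
    for doc in tagged_docs:
        terms = [f"{word}_{tag}" if use_tags else word for word, tag in doc]
        for term in terms:
            vocab.setdefault(term, len(vocab))
        docs_terms.append(terms)
    matrix = np.zeros((len(vocab), len(tagged_docs)))
    for j, terms in enumerate(docs_terms):
        for term in terms:
            matrix[vocab[term], j] += 1.0  # raw term counts (assumed weighting)
    return vocab, matrix


def lsa_doc_vectors(matrix, k):
    """Project documents onto the top-k singular directions via SVD."""
    _, s, vt = np.linalg.svd(matrix, full_matrices=False)
    # Row j of the result holds the k-dimensional coordinates of document j.
    return (s[:k, None] * vt[:k]).T


def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


plain_vocab, plain = build_matrix(TAGGED_DOCS, use_tags=False)
aug_vocab, augmented = build_matrix(TAGGED_DOCS, use_tags=True)

plain_docs = lsa_doc_vectors(plain, k=2)
aug_docs = lsa_doc_vectors(augmented, k=2)

# Documents 0 and 1 share only the surface form "book", in different senses.
print("plain     cos(d0, d1):", round(cosine(plain_docs[0], plain_docs[1]), 3))
print("augmented cos(d0, d1):", round(cosine(aug_docs[0], aug_docs[1]), 3))
```

In this toy corpus the untagged matrix treats "book" as a single row, so the library document and the flight document look partly similar. Once the noun and verb uses occupy separate rows, that overlap disappears. Real corpora would need an actual tagger and usually a weighting scheme such as TF-IDF, neither of which is shown here.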
Similar resources
Studying impressive parameters on the performance of Persian probabilistic context free grammar parser
In linguistics, a tree bank is a parsed text corpus that annotates syntactic or semantic sentence structure. The exploitation of tree bank data has been important ever since the first large-scale tree bank, The Penn Treebank, was published. However, although originating in computational linguistics, the value of tree bank is becoming more widely appreciated in linguistics research as a whole. F...
A Hybrid Method of Syntactic Feature and Latent Semantic Analysis for Automatic Arabic Essay Scoring
Background: Automated essay assessment is a challenging task because answers must be evaluated comprehensively to be validated accurately. The challenge increases for the Arabic language, where morphology, semantics, and syntax are complex. Methodology: Few research efforts have been proposed for Automatic Essay Scoring (AES) in Arabic. However,...
Applying Part-of-Speech Enhanced LSA to Automatic Essay Grading
Latent Semantic Analysis (LSA) is a widely used Information Retrieval method based on the "bag-of-words" assumption. However, it is generally accepted that syntax plays a role in representing the meaning of sentences. Thus, enhancing LSA with part-of-speech (POS) information to capture the context of word occurrences appears to be a theoretically feasible extension. The approach is tested empiricall...
An Investigation of Recursive Auto-associative Memory in Sentiment Detection
The rise of blogs, forums, social networks and review websites in recent years has provided very accessible and convenient platforms for people to express thoughts, views or attitudes about topics of interest. In order to collect and analyse opinionated content on the Internet, various sentiment detection techniques have been developed based on an integration of part-of-speech tagging, negation...
A Joint Semantic Vector Representation Model for Text Clustering and Classification
Text clustering and classification are two main tasks of text mining. Feature selection plays the key role in the quality of the clustering and classification results. Although word-based features such as term frequency-inverse document frequency (TF-IDF) vectors have been widely used in different applications, their shortcoming in capturing semantic concepts of text motivated researchers to use...